Corpus-based Techniques for Word Sense Disambiguation

Author

  • Gina-Anne Levow
Abstract

Consider the task of building a speech-to-speech translation system. One significant problem confronting the designer is the absence of a one-to-one mapping from word sounds to text strings to word meanings. The following examples show how pervasive this problem is. In a highly homophonous language like Chinese, the single sound sequence 'shi' maps to 56 different characters, each of which in turn has at least one meaning. In English, not only are there many text strings with context-dependent pronunciations and meanings ("record": the verb "re-córd" and the noun "ré-cord"), but there are also many words like "bank" that have only one pronunciation but take on numerous meanings. For example, "bank" can be used in "the bank of a river", "bank account", and "bank a plane". The most extreme form of this ambiguity appears in pronouns like "it", which take meaning only by reference to another element of the discourse. These mismatches multiply across languages: the English word "sentence" has two meanings, but in French these meanings must be realized as two different words, peine in the criminal sense and phrase in the grammatical sense. Dictation, speech recognition, machine translation, and Web-search document retrieval therefore all require the ability to select the correct word sense.
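The abstract motivates sense selection without describing a specific method. As a minimal illustration of what a corpus-based disambiguator can look like, the sketch below trains a naive Bayes classifier on sense-tagged context words for the "bank" example. The naive Bayes model, the sense labels (bank/river, bank/finance, bank/aviation), and the five training examples are hypothetical assumptions chosen for illustration; they are not the technique or data of this paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy sense-tagged examples for the ambiguous word "bank".
# Each entry is (context words, sense label); illustrative only, not real corpus data.
TRAINING = [
    (["river", "water", "muddy", "fishing"], "bank/river"),
    (["grassy", "river", "flooded"], "bank/river"),
    (["account", "deposit", "money", "loan"], "bank/finance"),
    (["money", "interest", "teller"], "bank/finance"),
    (["plane", "pilot", "turn", "wing"], "bank/aviation"),
]


def train(examples):
    """Count sense frequencies and per-sense context-word frequencies."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(context)
        vocab.update(context)
    return sense_counts, word_counts, vocab


def disambiguate(context, sense_counts, word_counts, vocab):
    """Pick the sense maximizing log P(sense) + sum of log P(word | sense).

    Word probabilities use add-one (Laplace) smoothing so that a context
    word never seen with a sense does not drive its probability to zero.
    """
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


if __name__ == "__main__":
    model = train(TRAINING)
    print(disambiguate(["deposit", "money"], *model))  # → bank/finance
    print(disambiguate(["river", "fishing"], *model))  # → bank/river
```

The same spelling "bank" is resolved purely from the surrounding words, which is the core idea behind corpus-based approaches; a practical system would estimate these counts from a large sense-tagged corpus rather than a handful of hand-written examples.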


Similar resources

An Insight into Word Sense Disambiguation Techniques

This paper presents various techniques used in the area of Word Sense Disambiguation (WSD). There are a number of techniques such as: Knowledge based approaches, which use the knowledge encoded in Lexical resources; Supervised Machine Learning methods in which the classifier is made to learn from previously semantically annotated corpora; Unsupervised approaches that cluster occurrences of w...


Word Sense Disambiguation of Ambiguous Persian Words Using the LDA Topic Model

Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. In this paper, a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word; second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...


Getting Serious About Word Sense Disambiguation

Recent advances in large-scale, broad coverage part-of-speech tagging and syntactic parsing have been achieved in no small part due to the availability of large amounts of online, human-annotated corpora. In this paper, I argue that a large, human sense-tagged corpus is also critical and necessary to achieve broad coverage, high accuracy word sense disambiguation, where the sense distinct...


Learning Expressive Models for Word Sense Disambiguation

We present a novel approach to the word sense disambiguation problem which makes use of corpus-based evidence combined with background knowledge. Employing an inductive logic programming algorithm, the approach generates expressive disambiguation rules which exploit several knowledge sources and can also model relations between them. The approach is evaluated in two tasks: identification of the...


A Hybrid Relational Approach for Word Sense Disambiguation

We propose a novel approach for word sense disambiguation which makes use of corpus-based evidence combined with background knowledge. Using an inductive logic programming technique, it generates expressive models which exploit several knowledge sources and also the relations between them. The approach is evaluated in two tasks: identification of the correct translation for verbs in English-Por...


Unsupervised Monolingual and Bilingual Word-Sense Disambiguation of Medical Documents using UMLS

This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using UMLS. We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora. The best results are obtained using relations between terms given by UMLS, a method which achieves 74% prec...




Publication date: 1997